Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora
Similar resources
Cross-Language Information Retrieval with Latent Topic Models Trained on a Comparable Corpus
In this paper we study cross-language information retrieval using a bilingual topic model trained on comparable corpora such as Wikipedia articles. The bilingual Latent Dirichlet Allocation model (BiLDA) creates an interlingual representation, which can be used as a translation resource in many different multilingual settings as comparable corpora are available for many language pairs. The prob...
Comparable Corpora in Cross-Language Information Retrieval
Cross-language information retrieval (CLIR) enables users to express queries in a language different from the language of the documents to be retrieved. For example, a Finnish-speaking person could pose a query to a CLIR system in Finnish (the source language) to retrieve documents written in English (the target language). The language barrier is usually crossed by translating the query into th...
Identifying Word Translations from Comparable Corpora Using Latent Topic Models
A topic model outputs a set of multinomial distributions over words for each topic. In this paper, we investigate the value of bilingual topic models, i.e., a bilingual Latent Dirichlet Allocation model for finding translations of terms in comparable corpora without using any linguistic resources. Experiments on a document-aligned English-Italian Wikipedia corpus confirm that the developed meth...
Explicit vs. Latent Concept Models for Cross-Language Information Retrieval
The field of information retrieval and text manipulation (classification, clustering) still strives for models allowing semantic information to be folded in to improve performance with respect to standard bag-of-word based models. Many approaches aim at a concept-based retrieval, but differ in the nature of the concepts, which range from linguistic concepts as defined in lexical resources such ...
Journal
Journal title: Information Retrieval
Year: 2012
ISSN: 1386-4564,1573-7659
DOI: 10.1007/s10791-012-9200-5